-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
add rtruncate, ltruncate, ctruncate for truncating strings
#55351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add rtruncate, ltruncate, ctruncate for truncating strings
#55351
Conversation
|
Isn't this font and terminal emulator dependent? (e.g. 🏳️🌈 is 1 wide in a browser but in my terminal emulator, displaying as 🏳️🌈) |
|
I'm far from an expert on this, but isn't that lack of adherence to the standard by the terminal, not variability that should be handled? |
In any case, |
no. Quoting https://www.unicode.org/emoji/charts/emoji-zwj-sequences.html:
|
|
These are adjacent functions to |
f13dcb6 to
3f012b2
Compare
|
(It definitely depends on the terminal; see the classic issue #3721 — but alternatives like |
|
Do we really need all three of these in the stdlib? |
|
I see two arguments for this functionality finding its way into
With regards to why While cpad(str, w) = lpad(rpad(str, w ÷ 2), w)and this was given as the reason why Unlike |
|
Oh also, I've just discovered a generalised truncation function I implemented in function struncate(str::AbstractString, maxwidth::Int, joiner::AbstractString = "…", mode::Symbol = :center)
textwidth(str) <= maxwidth && return str
left, right = firstindex(str), lastindex(str)
width = textwidth(joiner)
while true
if mode ∈ (:right, :center)
(width += textwidth(str[left])) <= maxwidth || break
left = nextind(str, left)
end
if mode ∈ (:left, :center) && width < maxwidth
(width += textwidth(str[right])) <= maxwidth || break
right = prevind(str, right)
end
end
str[begin:prevind(str, left)] * joiner * str[nextind(str, right):end]
end |
|
A single function like that seems tidier, but is it too atypical to put key functionality on a kwarg? Otherwise we might have had |
We could always use a single-function implementation internally (maybe with a |
3f012b2 to
62bb2d6
Compare
|
Indeed. Just updated to the single internal implementation |
62bb2d6 to
8436ea3
Compare
base/strings/util.jl
Outdated
| ctrunc(str::AbstractString, maxwidth::Int, replacement::Union{AbstractString,Char} = '…'; prefer_left::Bool = true) = | ||
| struncate(str, maxwidth, replacement, :center, prefer_left) | ||
|
|
||
| function struncate(str::AbstractString, maxwidth::Int, replacement::Union{AbstractString,Char}, mode::Symbol = :center, prefer_left::Bool = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| function struncate(str::AbstractString, maxwidth::Int, replacement::Union{AbstractString,Char}, mode::Symbol = :center, prefer_left::Bool = true) | |
| function _struncate(str::AbstractString, maxwidth::Int, replacement::Union{AbstractString,Char}, ::Val{mode}, prefer_left::Bool = true) where {mode} |
use an underscore to indicate that this is an internal method, and use Val so that the mode checks can be elided by the compiler?
(Especially since the mode checks occur in the inner loop.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have public we don't need to mark internals by name convention?
Val sounds good. I didn't appreciate why before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why Val and not e.g. a Symbol keyword arg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have
publicwe don't need to mark internals by name convention?
because the REPL autocompletes and even gives hints for private names: #51327
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That change just needs implementing, right? That can happen in time for 1.12 along with this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to make sure that the mode == ... conditionals are evaluated at compile time.
This should be backed up by some benchmark in my opinion. A single pointer equality check seems to me insignificant from the work of the function.
In principle this could also happen with enum or Symbol arguments if we ensure that the function is inlined.
I don't think constant propagation requires inlining. And there is also @constprop :aggressive.
Val is never really a good idea to make an API upon (any longer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be backed up by some benchmark in my opinion. A single pointer equality check seems to me insignificant from the work of the function.
I tried this with
s = randstring(10^5)
@btime struncate($s, 100, '…', :center);and I got about an 8% speedup by hard-coding the mode checks. (This is after doing the total_width optimization noted below — otherwise, it spends all its time on the textwidth(str) check.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get another 60% speedup by replacing textwidth(::Char) with an optimized version that has a fast path for ASCII characters. We might want to do this in a separate PR:
const textwidth_ascii = Cint[textwidth(Char(i)) for i = 0:127]
function textwidth2(c::AbstractChar)
b = bswap(reinterpret(UInt32, c))
b < 0x80 && @inbounds return Int(textwidth_ascii[b+1]) # ASCII fast path
Base.ismalformed(c) && return 1
Int(ccall(:utf8proc_charwidth, Cint, (UInt32,), c))
endUpdate: this optimization is #55398
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Val I do see reduction from
julia> @b ctrunc("dsadsafsadsadsafdsad", 10)
150.062 ns (4 allocs: 112 bytes)
to
136.986 ns (4 allocs: 112 bytes)
That's without #55398
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After #55398 this is now
julia> @b ctruncate("dsadsafsadsadsafdsad", 10)
91.665 ns (4 allocs: 112 bytes)
|
Could the exported functions get more descriptive names? E.g., |
|
@nsajko these are supposed to be the counterparts to to |
|
IMO, given that So, IMO, in order of preference:
FWIW. |
|
Ok. Fair.
|
base/exports.jl
Outdated
| rpad, | ||
| rsplit, | ||
| rstrip, | ||
| rtrunc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nsajko Regarding renaming. I'm wavering. We have rpad rsplit rstrip, lpad lstrip.
Moving to truncate_right etc. feels a bit awkward for something so closely tied.
What about rtruncate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, rtruncate is much better, I vote for rtruncate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do generally like unabbreviated names (nextind continues to annoy me).
However... I do wonder if there's a danger in ltruncate (& co.) being very similar in name to truncate yet doing something completely different?
Co-Authored-By: Timothy <[email protected]>
Co-Authored-By: Steven G. Johnson <[email protected]> Co-Authored-By: Timothy <[email protected]>
Co-Authored-By: Steven G. Johnson <[email protected]>
Co-Authored-By: Steven G. Johnson <[email protected]>
6c222fe to
bc9e780
Compare
rtrunc, ltrunc, ctrunc for truncating stringsrtruncate, ltruncate, ctruncate for truncating strings
|
Not to open a can of worms, but it occurs to me that in cases like: julia> struncate("🍕🍕🍕🍕🍕🍕xxxxxxxxxxx", 9, "…")
"🍕🍕🍕…xx"The "centre" truncate call has 6 cells of (Sorry Ian 😅) |
|
Just pushed a fix |
…erms of `textwidth` (JuliaLang#55351) Co-authored-by: Timothy <[email protected]> Co-authored-by: Steven G. Johnson <[email protected]>
Co-authored with @tecosaur
Looking at the implementation of these functions in
Profile(which are touched in #55335) uncovered issues and it seems to make sense to make and test the proper functions and add them to Base.